DOCUMENT RESUME

ED 476 430
TM 034 933

AUTHOR        Mumford, Karen R.; Ferron, John M.; Hines, Constance V.; Hogarty, Kristine Y.; Kromrey, Jeffery D.
TITLE         Factor Retention in Exploratory Factor Analysis: A Comparison of Alternative Methods.
PUB DATE      2003-04-00
NOTE          70p.; Paper presented at the Annual Meeting of the American Educational Research Association (Chicago, IL, April 21-25, 2003).
PUB TYPE      Reports - Research (143) -- Speeches/Meeting Papers (150)
EDRS PRICE    EDRS Price MF01/PC03 Plus Postage.
DESCRIPTORS   Comparative Analysis; *Factor Structure; Monte Carlo Methods; Simulation
IDENTIFIERS   *Exploratory Factor Analysis; Parallel Analysis (Horn)



ABSTRACT 

This study compared the effectiveness of 10 methods of
determining the number of factors to retain in exploratory common factor
analysis. The 10 methods included the Kaiser rule and a modified Kaiser
criterion, 3 variations of parallel analysis, 4 regression-based variations
of the scree procedure, and the minimum average partial procedure. The
performance of these procedures was evaluated based on the average number of
factors retained by each method, the proportion of samples retaining the same
number of factors as the true number of factors in the population, and the
proportion of samples retaining the same number of factors as a particular
rule of thumb applied to the population. The performance of the 10 procedures
was investigated using Monte Carlo methods in which random samples were
generated under known and controlled population conditions. Results clearly
suggest that both the choice of method and the design of the factor analytic
study play crucial roles in retaining the correct number of common factors.
In terms of overall accuracy across the conditions examined in this study,
one of the parallel analysis approaches (Montanelli and Humphreys, 1976)
provided the largest proportion of samples retaining the correct value.
Conditions under which other approaches may work well are discussed.
(Contains 16 tables, 18 figures, 25 charts, and 44 references.) (SLD)







Factor Retention 
1 






Factor Retention in Exploratory Factor Analysis: 
A Comparison of Alternative Methods 



Karen R. Mumford
John M. Ferron 
Constance V. Hines 
Kristine Y. Hogarty 
Jeffrey D. Kromrey 
University of South Florida 






Paper presented at the annual meeting of the American Educational Research Association, April 21-25,
2003, Chicago







Factor Retention in Exploratory Factor Analysis: 

A Comparison of Alternative Methods 
This study extends previous investigations of the effectiveness of various 
procedures for determining the number of factors to retain in exploratory common 
factor analysis. In our previous Monte Carlo study on the quality of factor analytic 
solutions (Ferron, Kromrey, Hogarty, Hines, & Mumford, 2002), we found that decision
rules used to make a determination of the number of factors to retain, in most cases, 
yielded underestimates of the known number of factors in the population. Zwick and 
Velicer (1986) suggest that the determination of the number of factors or components to 
retain is likely to be the most important decision that the researcher makes in the 
conduct of a factor analytic study. Researchers conducting exploratory factor analyses 
in most instances do not know the true number of factors that are expected to underlie 
the data, and may make decisions resulting in the retention of too few or too many 
factors. Since the number-of-factors decision is made prior to the factor rotation stage,
it subsequently impacts the results of the factor analysis, such as rotated factor patterns, 
factor score estimates, and the interpretability of the factors. Consequently, several 
researchers (e.g., Rummel, 1970; Fava & Velicer, 1996; Wood, Tataryn, & Gorsuch, 1996) 
have emphasized the importance of extracting the correct number of factors for rotation. 
As Turner (1998) and others note, the interpretation of the factors is based on the 
assumption that the researcher has extracted the correct number of factors. 

The problems of underfactoring (i.e., extracting too few factors) and 
overfactoring (i.e., extracting too many factors) have been addressed in the literature 
(see, for example, Crawford, 1975; Fava & Velicer, 1992, 1996; Gorsuch, 1983;
MacCallum, Widaman, Preacher & Hong, 2001; Mosier, 1939; Turner, 1998; Wood,
Tataryn & Gorsuch, 1996; Zwick & Velicer, 1986). In the case of underfactoring,
retaining too few factors for inclusion in the rotation phase can result in the loss of 
potentially important information (e.g., a substantive factor or factors) in the factor 
solution. With underfactoring, it has been argued, the true factors in a data set cannot 
be accurately portrayed (Cattell, 1958; Comrey, 1978; Fava & Velicer, 1996). Wood et al. 
(1996) note that when underfactoring occurs, the estimated factors are likely to contain
considerable error. In their investigation of the consequences of underfactoring in both
maximum likelihood factor analysis and principal component analysis, Fava and
Velicer (1996) found "severe degradation of factor score estimates" with underfactoring.
They also noted that principal component scores degraded less rapidly than factor
scores within methods.

Most researchers concur that overfactoring is less of a problem than underfactoring.
Fava and Velicer (1992) and others offer two theoretical justifications that support this 
point of view. The first, they posit, is based on the fact that each subsequent factor 
extracted accounts for less variance than the factor extracted prior to it. The second 
relates to the notion that if too many factors are retained, it is relatively easy, after
rotation, to discard trivial factors without changing the substantive factors.
Nevertheless, there is evidence to suggest that overfactoring is also quite
problematic and should be avoided. With overfactoring, the resultant factor solution 
may include factors that are not interpretable or unlikely to replicate (Zwick & Velicer, 
1986). It has been suggested that overfactoring may result in factor splitting at the 
rotation phase, or when rotating too many oblique factors, high interfactor correlations 
may result or the factor space may collapse (Crawford, 1975). Gorsuch (1983) warns 
that the extraction of too many factors may cause a common factor to be missed. In 
addition, several studies lend support to the notion that overfactoring may result in a
change in the overall factor structure (see, for example, Keil & Wrigley, 1960; Howard &
Gordon, 1963; Dingman, Miller, & Eyman, 1964). 

Wood et al. (1996) found that when overfactoring occurs, the estimated loadings 
for true factors usually contain less error than in the case of underfactoring. Based on 
the findings of their study, these researchers suggest that overfactoring is preferable to 
underfactoring provided that factor splitting is prevented and false factors are 
eventually eliminated (these authors advance methods for handling these two 
conditions). Fava and Velicer (1996) also concluded from their series of studies that
underfactoring was a much more severe problem in factor analysis than overfactoring. 
two successive slopes
* Multiple Regression (MR)

* k ≥ 3, p ≥ 6

* Based on same principle as CNG 

* Utilizes all the eigenvalues in each comparison 
of the slopes 

* Greatest absolute value of difference between 
the slopes 
Regression-Based Methods (cont'd)



* t-value index (t) 

* k ≥ 3, p ≥ 6

* A variation of the MR procedure 

* Slopes of the regression lines are compared 
using the usual formula for the t-test 

* Greatest absolute value of t 
* Standard Error of Scree (SEscree)

* Errors of estimate are calculated using a sequence 
of regression analyses employing a decreasing 
number of eigenvalues. 
* Minimum Average Partial (MAP)

* Based on matrices of partial correlations 

* After each of the factors is partialed out the 
average of the squared partial correlations is
calculated 

* Continued until the residual matrix most 
closely resembles an identity matrix 
* For each condition (i.e., interfactor
correlations, communality level, k, p) we 
generated a random sample of 10 
population R matrices 

* From each population R matrix generated 
1,000 samples of each size (N per p) 















Example of Population Pattern Matrix (k = 3, p = 15)

[Table: population pattern loadings under high, wide, and low communality]


















Criteria



* Performance was evaluated based on: 

* Average number of factors retained by each 
method 

* Proportion of samples retaining the same number
of factors as the population 

* Proportion of samples retaining the same number 
of factors as the rule of thumb applied to the 
population 






Influence of Design Factors



* Considerable variability was evidenced 
across the conditions examined for each 
method. 

* To investigate which design factors were 
associated with the most variability, 
was computed and examined. 








[Figure: Mean Number of Factors Retained by N per Variable (N:p) for k = 5]









[Figure: Mean Number of Factors Retained by Number of Variables per Factor (p:k) for k = 5]





[Figure: Mean Number of Factors Retained by Communality Type for k = 5]




[Figure: Proportion of Samples Retaining K Factors by True Number of Factors (k = 3, 5, 7), by method and rule of thumb]



[Figure: Proportion of Samples Retaining K Factors]

[Figure: Proportion of Samples Retaining K Factors by Communality Type]



* More variables per factor, larger samples,
higher communality

* Most rules underestimated true number of
factors

* PAmh is best overall, but MAP is slightly
better with small samples (N = 3p)

* Some methods rarely lead to correct
number of factors




* No method should be used in isolation 

* The 'art of factor analysis’ 

* Some methods should be avoided 

* What to do about low communality and
small samples? Need a new and better
thumb 




MacCallum et al. (2001) state that "given the serious consequences of underfactoring,
users of factor analysis should be very rigorous in making the number-of-factors
decision and should err in the direction of overfactoring when the evidence is
ambiguous." Wood et al. (1996), however, stress the importance of extracting
the correct number of factors, as they found in their studies that the factor solution with
the least error is the one that extracts the correct number of factors.

Methods for Determining Number of Factors 

The factor analysis literature is replete with recommendations regarding the 
most appropriate procedures to employ as well as an array of criteria to consider when 
addressing the number-of-factors problem (e.g., Gorsuch, 1983; Hakstian, Rogers, &
Cattell, 1982; Zwick & Velicer, 1986). According to Horn and Engstrom (1979), after 50 
years of work in the field, no fewer than 50 tests had been invented. There is, however, 
little consistency in the results obtained from applications of these tests across data sets 
and varying conditions. Thus, applied researchers are often uncertain about the
efficiency of the procedures they use in seeking to determine reasonable estimates of the 
number of factors that underlie a given data structure. 

Nasser, Benson, and Wisenbaker (2002) note that although researchers 
addressing the problem of how many factors to retain have advanced numerous
approaches based on different theoretical rationales (algebraic, statistical, psychometric, 
substantive importance, and interpretability), the most commonly employed methods 
for determining the number of factors in applied research are Kaiser's eigenvalue- 
greater-than-one rule (Kaiser, 1960) and Cattell's visual scree test (Cattell, 1966). Both 
methods are readily available in commonly used statistical software packages, which 
may account for their widespread use. However, several researchers have discussed the 
limitations of these two procedures and have warned against their use in making 
decisions regarding the number of factors to retain. 

In the case of the Kaiser eigenvalue-greater-than-one rule (Kaiser, 1960), factor retention is
based upon the number of eigenvalues greater than one. More specifically, the rule
calls for as many components or factors to be retained as there are eigenvalues greater
than 1.0. This procedure was initially suggested by Guttman (1954) based on the
consideration that it provided a lower bound for the number of common factors that
underlie a correlation matrix having unities in the main diagonal. Turner (1998) argues
that the Kaiser rule is flawed because it rests on the assumption that a factor is not
psychometrically sound if it accounts for less variance than a single variable does. Cliff
(1988) suggests that the rule lacks a logical basis, as one of the rationales for the rule was
based on a misapplication of a formula for estimating the reliability of a multi-item test.
A number of researchers (e.g., Cattell & Jaspers, 1967; Fava & Velicer, 1992; Hakstian,
Rogers, & Cattell, 1982; Lee & Comrey, 1979; Zwick & Velicer, 1986) have reported that
the Kaiser rule is problematic because it often leads to an overestimate of the number
of components or factors that underlie the data, thus giving rise to the potential
problems associated with overfactoring. Others report a tendency to either
overestimate or underestimate the number of components or factors depending on the
conditions present in the data, including whether it is applied to population or
sample matrices (see, for example, Cliff, 1988; Hakstian, Rogers & Cattell, 1982).

Cattell's scree test involves identifying the number of factors or components to 
retain based on a visual examination of the graph of eigenvalues plotted on the vertical 
axis and the factor sequence numbers plotted on the horizontal axis. The process 
involves separation of the "scree" of trivial factors from the "cliff" of nontrivial factors 
(Cattell, 1966). A straight edge is placed along the bottom part of the graph where the 
points form an approximately straight line (Cattell & Vogelman, 1977). The points 
above and to the left of the straight line correspond to the nontrivial factors; points on 
or close to the straight line constitute the trivial factors. This process of identifying the 
nontrivial factors and hence, the number of factors to retain, introduces a considerable 
amount of subjectivity on the part of the applied researcher. Several researchers have
noted that issues of reliability often arise in the use of this procedure (e.g., Crawford &
Koopman, 1979; Zwick & Velicer, 1986). The absence of an obvious break or the
presence of multiple breaks in the eigenvalue pattern may make it difficult to
decide on the appropriate number of factors to retain (Jurs, Zoski, &
Mueller, 1993; Nasser, Benson & Wisenbaker, 2002). Some researchers have reported
reasonable accuracy for the scree test; however, the range of conditions examined in
their studies was limited (see, for example, Tucker, Koopman & Linn, 1969; Cattell &
Jaspers, 1967; Zwick & Velicer, 1986). Zwick and Velicer (1986) indicated in their study 
of the number of components to retain that the scree test "was generally accurate but 
variable" and recommended against its use as the sole decision method. 

In light of the subjective nature of Cattell's visual scree test, researchers have 
sought to develop more objective variations of the scree test in order to overcome some 
of its limitations. Four regression-based variations have been proposed, namely, the
Cattell-Nelson-Gorsuch (CNG) test (Gorsuch & Nelson, 1981) and three procedures
proposed by Zoski and Jurs (1993, 1996): a multiple regression approach (MR), a t-value
index, and the standard error of scree (SEscree). A recent Monte Carlo study conducted
by Nasser, Benson, and Wisenbaker (2002) compared the performance of these four
regression-based methods under a variety of simulated conditions. The researchers
reported the SEscree procedure to be the most accurate in terms of its performance with
both correlated and uncorrelated factors. The other three methods were found to
consistently underestimate or overestimate the number of factors to be retained. These
researchers noted several limitations of their study, including the range of conditions
examined. They suggested that this investigation should be viewed as an initial but
crucial phase in the investigation of the efficiency of regression-based procedures for 
determining the number of factors, as previous work in the field was based solely on a 
few correlation matrices from the published literature. 

Additionally, other methods considered to be more effective than Kaiser's rule 
and Cattell's scree test in estimating the number of factors that underlie a data set have 
been proposed. These include parallel analysis (Horn, 1965; Montanelli & Humphreys,
1976; Lautenschlager, Lance, & Flaherty, 1989) and a minimum average partial (MAP)
correlation procedure proposed by Velicer (1976). 
Given the need for a more comprehensive study and comparison of the
effectiveness of methods proposed for determining the number of factors to retain in
exploratory factor analysis, we undertook an investigation of ten methods that can be
classified under four broad categories of procedures: the Kaiser criterion; parallel
analysis; regression-based variations of the scree test; and residual matrix analysis. The
ten methods examined are briefly described below. 

Description of Methods 

Kaiser Rule 

Eigenvalue Greater Than 1.0 (KSR). This is perhaps the most commonly used
method for determining the number of components to retain in principal component
analysis, and it is also frequently employed, not only by novice researchers, in the
conduct of exploratory common factor analysis. Zwick and Velicer (1986) suggest that
the popularity of its use may be due to its availability as the default option in many
statistical packages. The rationale for this method was initially proposed by Guttman
(1954) and later adapted and popularized by Kaiser (1960). Factors or components with
eigenvalues greater than 1.0 (i.e., components that account for more variance than a
single original variable) are retained in the analysis and subjected to rotation. As
noted earlier, several researchers have indicated that Kaiser's eigenvalue-greater-than-1
rule is problematic (e.g., Yeomans & Golden, 1982; Zwick & Velicer, 1986; Wood,
Tataryn, & Gorsuch, 1996) and have advised against its use in factor analysis. We
include it here because of its common use, and our desire to compare this method with 
more recently developed approaches. 

Eigenvalue Greater Than Average Eigenvalue in Sample (M-KSR). This modification
of the Kaiser rule was first proposed by Guttman (1954). This method, which may be 
more appropriate than the original KSR for common factor analysis, calls for the 
retention of factors with eigenvalues greater than the average eigenvalue in the sample. 
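Both Kaiser-type rules reduce to simple counts once the eigenvalues are in hand. The sketch below is a minimal illustration, not the authors' code; the function names and the block-structured correlation matrix are hypothetical. Because the eigenvalues of a correlation matrix with unities on the diagonal sum to p, their average is exactly 1.0, so M-KSR coincides with KSR for such a matrix; the two rules can differ only when the eigenvalues come from a reduced matrix, for example one with squared multiple correlations (SMCs) on the diagonal, as in principal axis factoring.

```python
import numpy as np

def kaiser_rule(eigenvalues):
    """KSR: number of eigenvalues greater than 1.0."""
    ev = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(ev > 1.0))

def modified_kaiser_rule(eigenvalues):
    """M-KSR: number of eigenvalues greater than the average eigenvalue."""
    ev = np.asarray(eigenvalues, dtype=float)
    return int(np.sum(ev > ev.mean()))

def reduced_matrix(R):
    """Copy of R with squared multiple correlations (SMCs) on the diagonal."""
    R = np.asarray(R, dtype=float)
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    return R_reduced

# Hypothetical correlation matrix: two blocks of three variables,
# r = .6 within blocks and 0 between blocks.
R = np.zeros((6, 6))
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)

ev_full = np.linalg.eigvalsh(R)                     # unities on the diagonal
ev_reduced = np.linalg.eigvalsh(reduced_matrix(R))  # SMCs on the diagonal
print("KSR:", kaiser_rule(ev_full), kaiser_rule(ev_reduced))
print("M-KSR:", modified_kaiser_rule(ev_full), modified_kaiser_rule(ev_reduced))
```

For a reduced-matrix eigenvalue vector such as [2.5, 0.9, 0.3, 0.1, -0.3], whose mean is 0.7, KSR retains one factor while M-KSR retains two.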
Parallel Analysis

This method was developed by Horn (1965). It involves the comparison of 
eigenvalues of a correlation matrix of p random uncorrelated variables with the
eigenvalues of the correlation matrix of the p variables in the actual data set, based on the
same sample size. Factors of the real data set that have eigenvalues greater than the 
eigenvalue of the corresponding factor in the random data set are retained and 
subjected to rotation. 
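The comparison can be carried out directly by simulation. The following is a minimal sketch of Horn's idea under stated assumptions: the random variables are standard normal, both matrices keep unities on the diagonal, and the random-data eigenvalues are averaged over replications (one common choice). The function name, the replication count, and the two-factor test data are hypothetical.

```python
import numpy as np

def horn_parallel_analysis(data, n_reps=100, seed=0):
    """Count the leading sample eigenvalues that exceed the mean eigenvalues of
    correlation matrices computed from random normal data with the same N and p."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    # Eigenvalues of the observed correlation matrix, largest first.
    sample_ev = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))[::-1]
    random_ev = np.zeros(p)
    for _ in range(n_reps):
        noise = rng.standard_normal((n, p))  # p uncorrelated variables, same N
        random_ev += np.linalg.eigvalsh(np.corrcoef(noise, rowvar=False))[::-1]
    random_ev /= n_reps
    n_factors = 0
    for s, r in zip(sample_ev, random_ev):
        if s <= r:
            break  # the sample curve has reached the random-data curve
        n_factors += 1
    return n_factors, sample_ev, random_ev

# Hypothetical two-factor data: 500 cases, 3 variables per factor, loadings .8.
rng = np.random.default_rng(1)
loadings = np.zeros((6, 2))
loadings[:3, 0] = 0.8
loadings[3:, 1] = 0.8
factors = rng.standard_normal((500, 2))
X = factors @ loadings.T + 0.6 * rng.standard_normal((500, 6))
k, _, _ = horn_parallel_analysis(X)
print("factors retained:", k)
```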

Parallel Analysis (PAmh). A regression equation for predicting the eigenvalues of 
random correlation matrices with squared multiple correlations on the main diagonal of 
the random correlation matrix was subsequently developed by Montanelli and 
Humphreys (1976): 

$$\ln(\lambda_i) = a_i + b_i \ln(N - 1) + c_i \ln\{0.5\,k(k - 1) - (i - 1)k\}$$

where N = number of observations, 

k = number of variables in the correlation matrix, and 
i = eigenvalue sequence number. 

The regression weights ($a_i$, $b_i$, $c_i$) were obtained via simulation methods and are
provided by Montanelli and Humphreys (1976). 

A decision rule was suggested based on the point at which the plot of sample 
eigenvalues crosses the plot of expected eigenvalues from matrices of random data 
(Montanelli & Humphreys, 1976). Factors above the point at which the two lines cross
were retained for rotation. Factors that lie below the point that the plots crossed (i.e., 
those factors whose eigenvalues were less than the eigenvalues of the corresponding 
factors of the random data set) were considered trivial factors and were not retained. 
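A minimal sketch of the regression-based version: evaluate the Montanelli and Humphreys prediction equation for whatever weights are supplied, then apply the crossing rule. The published weights are not reproduced in this paper, so the numbers below are hypothetical placeholders that only exercise the code; the function names are ours as well. Because the logarithm's argument, 0.5k(k - 1) - (i - 1)k, becomes non-positive once i reaches (k + 1)/2, the sketch stops predicting at that point.

```python
import math

def mh_random_eigenvalues(N, k, a, b, c):
    """Predicted random-data eigenvalues (SMCs on the diagonal) from
    ln(lambda_i) = a_i + b_i ln(N - 1) + c_i ln{0.5 k(k - 1) - (i - 1)k}."""
    predicted = []
    for i, (a_i, b_i, c_i) in enumerate(zip(a, b, c), start=1):
        term = 0.5 * k * (k - 1) - (i - 1) * k
        if term <= 0:
            break  # the logarithm is undefined from here on
        predicted.append(math.exp(a_i + b_i * math.log(N - 1) + c_i * math.log(term)))
    return predicted

def crossing_rule(sample_ev, random_ev):
    """Retain factors up to the point where the sample curve crosses the random curve."""
    retained = 0
    for s, r in zip(sample_ev, random_ev):
        if s <= r:
            break
        retained += 1
    return retained

# HYPOTHETICAL placeholder weights for illustration only; NOT the published values.
a = [0.10, 0.05, 0.00]
b = [-0.20, -0.20, -0.20]
c = [0.15, 0.15, 0.15]
random_ev = mh_random_eigenvalues(N=200, k=10, a=a, b=b, c=c)
sample_ev = [3.1, 1.4, 0.5]  # hypothetical sample eigenvalues
print(crossing_rule(sample_ev, random_ev))
```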

Humphreys and Montanelli (1975) found the parallel analysis method to provide 
accurate estimates of the number of factors to be retained in common factor analysis. 
Zwick and Velicer (1986) also reported that within the context of principal components
analysis, the parallel analysis method yielded consistently accurate estimates of the
number of components to retain. In fact, they indicated that in their study, in which
they compared five different rules for determining the number of components to retain,
the parallel analysis method "was typically the most accurate method at each level of
complexity examined." 

Parallel Analysis Close (PAcl). A modification of Horn's parallel analysis
procedure is to eliminate factors with eigenvalues that are close in magnitude but not 
necessarily less than the magnitude of the corresponding eigenvalues of the random 
data. Such a modified criterion is expected to retain fewer factors than the original Horn 
procedure. For this study, sample eigenvalues within .10 of the random data 
eigenvalues were considered close enough to stop retaining factors. 
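The "close" criterion changes only the stopping condition. In this minimal sketch (hypothetical function name and eigenvalues), retention continues while each sample eigenvalue exceeds its random-data counterpart by more than a tolerance of .10 and stops at the first one that is within .10 of, or below, the random value; setting the tolerance to zero recovers the ordinary crossing rule.

```python
def parallel_analysis_close(sample_ev, random_ev, tol=0.10):
    """Retain factors while sample - random > tol; stop at the first eigenvalue
    that is within tol of (or below) its random-data counterpart."""
    retained = 0
    for s, r in zip(sample_ev, random_ev):
        if s - r <= tol:
            break
        retained += 1
    return retained

sample_ev = [2.00, 1.25, 1.05, 0.70]  # hypothetical sample eigenvalues
random_ev = [1.20, 1.10, 1.00, 0.90]  # hypothetical random-data eigenvalues
print(parallel_analysis_close(sample_ev, random_ev, tol=0.0))  # 3 (ordinary crossing rule)
print(parallel_analysis_close(sample_ev, random_ev))           # 2 (close criterion)
```

The third eigenvalue (1.05 against 1.00) is retained by the ordinary rule but dropped by the close criterion, which illustrates why PAcl is expected to retain fewer factors.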

Modified Parallel Analysis (PAll). Lautenschlager, Lance and Flaherty (1989) 
proposed a modification to the general regression equation first developed by 
Montanelli and Humphreys (1976) and further refined by Allen and Hubbard (1986) to 
predict eigenvalues of factors resulting from a random data correlation matrix. These 
researchers noted that Allen and Hubbard's equation did not yield an accurate
estimate of the first eigenvalue; this in turn directly influenced the accuracy of 
subsequent eigenvalues in the series. They modified the equation proposed by Allen 
and Hubbard (1986) by adding a variables-to-subjects ratio term: 

$$\ln(\lambda_i) = a_i + b_i \ln(N - 1) + c_i \ln\{0.5\,(k - i - 1)(k - i + 2)\} + d_i \ln(\lambda_{i-1}) + e_i (k/N)$$

The regression weights ($a_i$, $b_i$, $c_i$, $d_i$, and $e_i$) are provided by Lautenschlager, Lance
and Flaherty (1989). 

The results of a Monte Carlo study using the newly augmented regression 
equation showed that it served to significantly improve the prediction of the first, and
consequently subsequent, eigenvalues (up to 48) of a random data matrix
(Lautenschlager et al., 1989) when compared to eigenvalues generated from the Allen
and Hubbard (1986) equation. These researchers recommend use of the modified
regression equation with the added variables-to-subjects ratio term in future parallel
analysis applications. 
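The augmented equation is recursive: each predicted eigenvalue enters the prediction of the next through the $d_i \ln(\lambda_{i-1})$ term, and the $e_i(k/N)$ term adds the variables-to-subjects ratio. The sketch below evaluates it for supplied weights. The weights in the example are hypothetical placeholders, not the published table, and dropping the $\lambda_{i-1}$ term for i = 1 (equivalently, treating $\lambda_0$ as 1) is our assumption; the original article should be consulted for the exact convention.

```python
import math

def llf_random_eigenvalues(N, k, a, b, c, d, e):
    """Predicted random-data eigenvalues from
    ln(lambda_i) = a_i + b_i ln(N-1) + c_i ln{0.5 (k-i-1)(k-i+2)}
                   + d_i ln(lambda_{i-1}) + e_i (k/N)."""
    predicted = []
    prev = None
    for i in range(1, len(a) + 1):
        term = 0.5 * (k - i - 1) * (k - i + 2)
        if term <= 0:
            break  # the logarithm is undefined from here on
        log_lam = (a[i - 1] + b[i - 1] * math.log(N - 1)
                   + c[i - 1] * math.log(term) + e[i - 1] * (k / N))
        # ASSUMPTION: for i = 1 the lambda_{i-1} term is dropped (lambda_0 treated as 1).
        if prev is not None:
            log_lam += d[i - 1] * math.log(prev)
        lam = math.exp(log_lam)
        predicted.append(lam)
        prev = lam  # feeds the next prediction
    return predicted

# HYPOTHETICAL placeholder weights for illustration only; NOT the published values.
weights = dict(a=[0.1, 0.0, -0.1], b=[-0.2] * 3, c=[0.15] * 3,
               d=[0.0, 0.5, 0.5], e=[0.3] * 3)
print(llf_random_eigenvalues(N=200, k=10, **weights))
```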

Regression-Based Variations of the Visual Scree 
Each of the methods described in this category is based on Cattell's (1978)
guidelines for determining the number of common factors to retain. 

Cattell-Nelson-Gorsuch (CNG). In an attempt to find a more objective approach to 
the visual scree test, Gorsuch and Nelson (1983) developed an analytical method using 
multiple linear regression to determine the number of factors to retain. In this 
procedure, the slopes of all possible pairs of adjacent sets of three eigenvalues are
compared. More specifically, the slope of the first three points (roots) in the eigenvalue
plot (1, 2, and 3) is compared with the slope of the next three points (4, 5, and 6). Then
the slope of points 2, 3, and 4 is compared with the slope of points 5, 6, and 7, and so on
until all points have been incorporated in the comparisons. The number of factors to be 
retained is defined by the point at which the difference between the two successive 
slopes is greatest. 
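A minimal sketch of the CNG comparison, under two assumptions the text leaves open: the slope difference is taken in absolute value (as in the MR procedure), and comparison j (points j to j+2 against j+3 to j+5) corresponds to retaining j + 2 factors, so that the first comparison yields three factors, consistent with the method's k ≥ 3 requirement. The function names and eigenvalues are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

def cng(eigenvalues):
    """Compare the slope of each set of three adjacent eigenvalues with the slope
    of the next three; return the factor count at the largest absolute difference.
    ASSUMPTION: the break falls after the last point of the first triplet."""
    ev = np.asarray(eigenvalues, dtype=float)
    p = len(ev)
    if p < 6:
        raise ValueError("CNG needs at least six eigenvalues")
    x = np.arange(1, p + 1, dtype=float)
    diffs = []
    for j in range(p - 5):  # 0-based start of the first triplet
        s1 = slope(x[j:j + 3], ev[j:j + 3])
        s2 = slope(x[j + 3:j + 6], ev[j + 3:j + 6])
        diffs.append(abs(s1 - s2))
    # 0-based comparison j covers points j+1..j+3 first, so retain j + 3 factors.
    return int(np.argmax(diffs)) + 3

ev = [6.0, 3.0, 1.5, 0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical eigenvalues
print(cng(ev))
```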

Multiple Regression (MR). The CNG procedure uses only six eigenvalues (i.e., six 
data points) at a time to determine each pair of slopes for comparison. Zoski and Jurs 
(1993) suggest that this procedure uses only a limited amount of information to 
determine the number of factors to retain. They therefore proposed a multiple linear 
regression approach that would include more data points in computing the slope of the 
regression lines. The MR approach is based on the same principle as the CNG, but it 
utilizes all the eigenvalues in each comparison of pairs of slopes. A series of adjacent 
slopes are obtained and all possible successive pairs of slopes are compared. One slope
in each paired comparison is based on an increasing number of eigenvalues and
corresponds with the major factors; the second slope is based on a decreasing number of
eigenvalues and corresponds with the trivial (scree) factors. More specifically, the slope
of the regression line developed from points 1, 2, and 3 in the eigenvalue plot is 
compared to the slope of the regression line developed from points 4, 5, 6, ..., p (where 
p is the number of eigenvalues in the plot). Then, the slope of the regression line for 
points 1, 2, 3, and 4 is compared to the slope for points 5, 6, 7, ..., p, and so on. This 
process continues until all adjacent slopes of all possible successive pairs of regression 
lines are compared. The decision regarding the number of factors to retain corresponds 
to the point at which the absolute value of the difference between the slopes is the
greatest. 
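The MR comparison can be sketched in the same way. Here split m means points 1 to m (major factors) against points m+1 to p (scree), and m is returned as the number of factors. Requiring at least three eigenvalues in each segment is our assumption; it matches the requirement that p ≥ 6. Names and eigenvalues are hypothetical.

```python
import numpy as np

def slope(x, y):
    """Ordinary least-squares slope of y on x."""
    return np.polyfit(x, y, 1)[0]

def mr_scree(eigenvalues, min_points=3):
    """Compare the slope through eigenvalues 1..m with the slope through m+1..p
    for every admissible m; return m at the largest absolute slope difference.
    ASSUMPTION: both segments contain at least `min_points` eigenvalues."""
    ev = np.asarray(eigenvalues, dtype=float)
    p = len(ev)
    if p < 2 * min_points:
        raise ValueError("need at least 2 * min_points eigenvalues")
    x = np.arange(1, p + 1, dtype=float)
    splits = list(range(min_points, p - min_points + 1))
    diffs = [abs(slope(x[:m], ev[:m]) - slope(x[m:], ev[m:])) for m in splits]
    return splits[int(np.argmax(diffs))]

ev = [6.0, 3.0, 1.5, 0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical eigenvalues
print(mr_scree(ev))
```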

t-value index (t). Another procedure examined, known as the t-value index, is a variation
of the MR procedure. Using the slopes obtained in the MR procedure,
the slopes of the regression lines are compared using the usual formula for the t-test of 
the difference between slopes. The number of factors to retain is based on the largest 
absolute value of t. 
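The text does not spell out "the usual formula," so this sketch assumes the standard pooled-variance t for comparing two regression slopes: t = (b1 - b2) / sqrt(s²(1/Sxx1 + 1/Sxx2)), where s² pools the residual sums of squares of both fits over n1 + n2 - 4 degrees of freedom. The splits are the same as in the MR sketch; the names and eigenvalues are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """OLS fit of y on x: return slope, residual sum of squares, and Sxx."""
    b, a = np.polyfit(x, y, 1)
    resid = y - (a + b * x)
    return b, float(resid @ resid), float(((x - x.mean()) ** 2).sum())

def t_index(eigenvalues, min_points=3):
    """Return (m, |t|) for the split with the largest |t| between the slope through
    eigenvalues 1..m and the slope through m+1..p.
    ASSUMPTION: pooled-variance t-test for the difference between two slopes."""
    ev = np.asarray(eigenvalues, dtype=float)
    p = len(ev)
    if p < 2 * min_points:
        raise ValueError("need at least 2 * min_points eigenvalues")
    x = np.arange(1, p + 1, dtype=float)
    best_m, best_t = None, -1.0
    for m in range(min_points, p - min_points + 1):
        b1, sse1, sxx1 = fit_line(x[:m], ev[:m])
        b2, sse2, sxx2 = fit_line(x[m:], ev[m:])
        df = (m - 2) + (p - m - 2)   # residual df of the two fits combined
        s2 = (sse1 + sse2) / df      # pooled residual variance
        se = np.sqrt(s2 * (1.0 / sxx1 + 1.0 / sxx2))
        t = np.inf if se == 0 else abs(b1 - b2) / se
        if t > best_t:
            best_m, best_t = m, t
    return best_m, best_t

m, t = t_index([6.0, 3.0, 1.5, 0.9, 0.8, 0.7, 0.6, 0.5])  # hypothetical eigenvalues
print(m, round(float(t), 2))
```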

Standard Error of Scree (SEscree). Because the CNG, MR, and t-value index are not
applicable when the number of factors is less than three and/or the number of variables
is less than six, Zoski and Jurs (1996) developed the standard error of scree (SEscree)
procedure which is based on the standard error of estimate for a set of points in the plot 
of the eigenvalues. In this procedure, the errors of estimate are calculated using a 
sequence of regression analyses employing a decreasing number of eigenvalues. 

Initially, all eigenvalues are regressed onto their ordinal numbers. The procedure 
continues with subsequent sets of eigenvalues and concludes when the standard error 
of estimate meets the 1/p criterion. Because the error variance tends to be inversely
related to sample size, Zoski and Jurs (1996) set the value of 1/p as the criterion for
determining the number of factors to retain (i.e., the number of standard errors that
exceed 1/p is the number of factors to retain). Zoski and Jurs (1996) and Nasser et al.
(2002) found SEscree to be more accurate in identifying the number of factors to retain
than the CNG and multiple regression (MR) procedures. 
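A minimal sketch of the SEscree sequence: regress eigenvalues i to p on their ordinal numbers for i = 1, 2, and so on, computing the standard error of estimate sqrt(SSE/(n - 2)) each time, and count the standard errors that exceed 1/p. Stopping at the first standard error that does not exceed 1/p, and dropping the largest remaining eigenvalue at each step, are our reading of the description above. The function name and eigenvalues are hypothetical.

```python
import numpy as np

def se_scree(eigenvalues):
    """Count successive standard errors of estimate that exceed 1/p, where the
    i-th regression uses eigenvalues i..p regressed on their ordinal numbers.
    ASSUMPTION: the count stops at the first standard error <= 1/p."""
    ev = np.asarray(eigenvalues, dtype=float)
    p = len(ev)
    x = np.arange(1, p + 1, dtype=float)
    criterion = 1.0 / p
    retained = 0
    for start in range(p - 2):  # at least 3 points, so n - 2 > 0
        xs, ys = x[start:], ev[start:]
        b, a = np.polyfit(xs, ys, 1)
        resid = ys - (a + b * xs)
        se = np.sqrt(resid @ resid / (len(ys) - 2))  # standard error of estimate
        if se <= criterion:
            break
        retained += 1
    return retained

ev = [6.0, 3.0, 1.5, 0.9, 0.8, 0.7, 0.6, 0.5]  # hypothetical eigenvalues
print(se_scree(ev))
```

With these values the standard errors for the first three regressions (about 0.54 for eigenvalues 2 to 8 and 0.17 for eigenvalues 3 to 8, for example) exceed 1/8, while eigenvalues 4 to 8 lie exactly on a line.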

Residual Matrix Analysis 

Minimum Average Partial (MAP) Correlation. The Minimum Average Partial
(MAP) method developed by Velicer (1976) is based on a matrix of partial correlations. 
After each of the factors has been partialed out, the average of the squared partial 
correlations is calculated. When the residual matrix most closely resembles an identity 
matrix, no further factors are extracted and rotated. Using this method, at least two 
variables will have high loadings on each retained factor. Zwick and Velicer (1986) 
reported the MAP to be one of the two most accurate methods (the other being parallel
analysis) in determining the number of components to retain in their study of five 
decision rules in principal components analysis. 
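A minimal sketch of the MAP computation, following the principal-component formulation in which it is usually implemented (an assumption about details the text does not give): after partialing out the first m components, rescale the residual covariance to partial correlations, average the squared off-diagonal elements, and retain the m with the smallest average. The loop stops at p - 2 components to avoid a rank-one residual. The function name and the simulated two-factor data are hypothetical.

```python
import numpy as np

def velicer_map(R):
    """Return (number of components at the minimum average squared partial
    correlation, list of averages for m = 0, 1, ..., p - 2)."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]  # largest components first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    off_diag = ~np.eye(p, dtype=bool)
    avg_sq = []
    for m in range(p - 1):
        A = loadings[:, :m]
        C = R - A @ A.T                   # residual after partialing out m components
        d = np.sqrt(np.diag(C))
        partial = C / np.outer(d, d)      # partial correlation matrix
        avg_sq.append(float(np.mean(partial[off_diag] ** 2)))
    return int(np.argmin(avg_sq)), avg_sq

# Hypothetical two-factor data: 500 cases, 4 variables per factor, loadings .8.
rng = np.random.default_rng(2)
pattern = np.zeros((8, 2))
pattern[:4, 0] = 0.8
pattern[4:, 1] = 0.8
X = rng.standard_normal((500, 2)) @ pattern.T + 0.6 * rng.standard_normal((500, 8))
n_map, curve = velicer_map(np.corrcoef(X, rowvar=False))
print("components retained by MAP:", n_map)
```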

Purpose 

The purpose of this study was to compare the effectiveness of the ten
aforementioned methods in determining the number of factors to retain in exploratory
common factor analysis: the Kaiser rule (KSR); a modified Kaiser criterion (M-KSR);
three variations of parallel analysis, namely Montanelli and Humphreys' (PAmh),
parallel analysis-close (PAcl), and Lautenschlager et al.'s modified parallel analysis
(PAll); four regression-based variations of the scree procedure (CNG, MR, t, SEscree);
and the minimum average partial (MAP) procedure. The performance of these procedures
was evaluated based on the average number of factors retained by each method, the
proportion of samples retaining the same number of factors as the true number of
factors in the population, and the proportion of samples retaining the same number of
factors as a particular rule of thumb applied to the population.

Method 

The performance of the ten procedures was investigated using Monte Carlo
methods, in which random samples were generated under known and controlled
population conditions. The population correlation matrices varied with respect to the
particular aspects of interest. Population correlation matrices were constructed based
on the methods described by Tucker, Koopman and Linn (1969), which have been used
in large-scale simulation studies (MacCallum et al., 1999; Tucker et al., 1969). For
correlated factors, a generalization of the method described by Hong (1999) was
utilized. The true factor loading patterns underlying these matrices were
constructed to exhibit relatively clear simple structure. For these population 
correlation matrices, the number of measured variables, the number of common factors, 
the level of communality and the correlation between factors were controlled. Sample 
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correlation matrices were generated from each of the known population correlation 
matrices. Each of the sample matrices was then analyzed using principal axis factor 
extraction. The pattern of eigenvalue magnitude was used to determine the number of 
factors that should be retained. 
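The sampling step can be sketched as follows, assuming multivariate normal data (the distribution used to draw the samples is not stated in this section). The sketch draws N observations from a population correlation matrix and computes the eigenvalues of the reduced correlation matrix, with squared multiple correlations (SMCs) on the diagonal, which is the usual starting point for principal axis extraction. The function names and the non-iterated SMC starting values are assumptions, not the authors' SAS/IML code.

```python
import numpy as np


def sample_correlation(R_pop, n, rng):
    """Draw n cases from a multivariate normal with correlation matrix R_pop
    (assumed distribution) and return the sample correlation matrix."""
    p = R_pop.shape[0]
    X = rng.multivariate_normal(np.zeros(p), R_pop, size=n)
    return np.corrcoef(X, rowvar=False)


def reduced_eigenvalues(R):
    """Eigenvalues (largest first) of R with squared multiple correlations
    substituted on the diagonal, plus the SMCs themselves."""
    smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    R_reduced = R.copy()
    np.fill_diagonal(R_reduced, smc)
    return np.sort(np.linalg.eigvalsh(R_reduced))[::-1], smc


# Example: one-factor population with all loadings .8, N = 300
rng = np.random.default_rng(0)
R_pop = np.full((6, 6), 0.64)
np.fill_diagonal(R_pop, 1.0)
R_sample = sample_correlation(R_pop, 300, rng)
eigenvalues, smc = reduced_eigenvalues(R_sample)
```

Because only the diagonal changes, the reduced eigenvalues sum to the sum of the SMCs, which is a quick sanity check on the computation.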

Generation of Population Matrices 

In the Monte Carlo study, population correlation matrices with uncorrelated factors were generated using the method presented by Tucker, Koopman, and Linn (1969). These matrices were generated under the assumption that the common factor model holds exactly in the population. This method produces population matrices leading to relatively clear simple structure and has been used in other simulation studies (MacCallum et al., 1999; Mundfrom, Shaw & Ke, 2001; Tucker, Koopman and Linn, 1969). The population R is generated based on major, minor, and unique factors:

$$\mathbf{R} = \mathbf{A}_1\mathbf{A}_1' + \mathbf{A}_2\mathbf{A}_2' + \mathbf{A}_3\mathbf{A}_3'$$

where $\mathbf{A}_1$ is the $p \times k$ matrix of actual input factor loadings for the major factors, $\mathbf{A}_2$ is the matrix of actual input factor loadings for the minor factors, and $\mathbf{A}_3$ is the $p \times p$ diagonal matrix of actual input factor loadings for the unique factors. The contribution of the minor factors ($\mathbf{A}_2$) was set to zero in this study so that the data generation model matched a factor analytic model with k common factors. The common factors were specified as orthogonal.

The process for creating $\mathbf{A}_1$ starts with the creation of a matrix of conceptual input factor loadings, $\bar{\mathbf{A}}_1$. To create $\bar{\mathbf{A}}_1$, the loading of a variable on a randomly selected factor $j$ ($j = 1, \ldots, k$) is set to a value randomly chosen between 0 and $k-1$ (for a three-factor model, $\bar{a}_{ij}$ could be 0, 1, or 2). Next, the loading on a randomly selected factor from those remaining is set to a value randomly chosen between 0 and $k-1-\bar{a}_{ij}$. This process continues until a conceptual input factor loading has been chosen for each factor, and it ensures that the sum of the loadings across the factors is $k-1$. The process is then repeated for each of the p variables.
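The row-by-row procedure can be sketched as below. This is a hypothetical illustration, not the Tucker, Koopman, and Linn code: the function name is invented, and having the last factor visited take whatever remains of the row total is an assumption made so that every row sums to exactly k - 1, as the text states.

```python
import numpy as np


def conceptual_loadings(p, k, rng):
    """Hypothetical sketch of a conceptual input loading matrix.

    For each variable, factors are visited in random order; each receives a
    random integer between 0 and what remains of the row total k - 1, and the
    last factor visited takes the remainder (an assumption that guarantees the
    stated row sum of k - 1).
    """
    A_bar = np.zeros((p, k), dtype=int)
    for i in range(p):
        remaining = k - 1
        order = rng.permutation(k)
        for position, j in enumerate(order):
            if position == k - 1:
                A_bar[i, j] = remaining
            else:
                value = int(rng.integers(0, remaining + 1))   # 0..remaining inclusive
                A_bar[i, j] = value
                remaining -= value
    return A_bar


A_bar = conceptual_loadings(p=15, k=3, rng=np.random.default_rng(1))
```

For k = 3 every row is a permutation of (2, 0, 0) or (1, 1, 0), which is where the later steps (error, skewing, and scaling) start from.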
The matrix of actual input factor loadings, $\mathbf{A}_1$, is then created from the matrix of conceptual input factor loadings, $\bar{\mathbf{A}}_1$, through a series of three steps: (1) normal deviates are added to introduce error, (2) a skewing function is used to limit negative factor loadings, and (3) the matrix is scaled to ensure the desired levels of communality. The diagonal matrix $\mathbf{A}_3$ for the unique factors is also scaled to ensure the desired levels of communality. The levels of communality ($h^2$) that were simulated were high ($h^2$ for each variable drawn randomly from the values .6, .7, and .8), wide ($h^2$ for each variable drawn randomly from .2, .3, .4, .5, .6, .7, and .8), and low ($h^2$ for each variable drawn randomly from .2, .3, and .4), which are consistent with the levels used in other simulation studies. An example matrix of input factor loadings, $\mathbf{A}_1$, created using these methods is presented in Table 1 (with k = 3 and p = 15) for each level of communality.
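Steps (1) and (2) are not described in enough detail here to reproduce, so the hypothetical sketch below covers only step (3) and the unique-factor matrix. Each row of a loading matrix is rescaled so that its squared loadings sum to a communality drawn from the chosen level, A3 is set to diag(sqrt(1 - h²)), and the orthogonal population matrix R = A1A1' + A3A3' (with A2 = 0) then has a unit diagonal. Rescaling each row to a fixed length is an assumption about how the scaling was carried out.

```python
import numpy as np

COMMUNALITY_LEVELS = {
    "high": [0.6, 0.7, 0.8],
    "wide": [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "low": [0.2, 0.3, 0.4],
}


def scale_to_communality(A, level, rng):
    """Rescale each (nonzero) row of A so its squared loadings sum to a
    communality drawn at random from the chosen level; return A1, A3, h2."""
    A = np.asarray(A, dtype=float)
    h2 = rng.choice(COMMUNALITY_LEVELS[level], size=A.shape[0])
    row_length = np.sqrt((A ** 2).sum(axis=1))
    A1 = A / row_length[:, None] * np.sqrt(h2)[:, None]
    A3 = np.diag(np.sqrt(1.0 - h2))          # unique-factor loadings
    return A1, A3, h2


def orthogonal_population_r(A1, A3):
    """R = A1 A1' + A3 A3', with the minor-factor term A2 A2' set to zero."""
    return A1 @ A1.T + A3 @ A3.T


# Hypothetical loadings after error and skewing, three variables, k = 3
A = np.array([[2.0, 0.1, -0.1], [1.0, 0.9, 0.0], [0.2, 0.1, 1.8]])
A1, A3, h2 = scale_to_communality(A, "wide", np.random.default_rng(2))
R = orthogonal_population_r(A1, A3)       # unit diagonal by construction
```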

To construct population matrices for correlated factors, the above method was 
modified following Hong (1999). More specifically, 

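$$\mathbf{R} = \mathbf{J}\mathbf{B}\mathbf{J}' + \mathbf{A}_3\mathbf{A}_3'$$

where

$$\mathbf{J} = [\mathbf{A}_1 \;\; \mathbf{A}_2]$$

and

$$\mathbf{B} = \begin{bmatrix} \boldsymbol{\Phi} & \boldsymbol{\Psi}' \\ \boldsymbol{\Psi} & \boldsymbol{\Gamma} \end{bmatrix}$$

where $\boldsymbol{\Phi}$ is the matrix of correlations among major factors, $\boldsymbol{\Gamma}$ is the matrix of correlations among minor factors, and $\boldsymbol{\Psi}$ is the matrix of correlations between major and minor factors. The simulations conducted for this study were somewhat simplified because the contributions of the minor factors ($\mathbf{A}_2$, $\boldsymbol{\Gamma}$, $\boldsymbol{\Psi}$) were set to zero, so that $\mathbf{R} = \mathbf{A}_1\boldsymbol{\Phi}\mathbf{A}_1' + \mathbf{A}_3\mathbf{A}_3'$.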
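With the minor-factor blocks set to zero, JBJ' reduces to A1ΦA1'. The sketch below builds R in that reduced form. How the unique-factor matrix was scaled for correlated factors is not spelled out here, so choosing A3 to give R a unit diagonal (using the communalities implied by A1ΦA1') is an assumption, as is the function name.

```python
import numpy as np


def correlated_population_r(A1, Phi):
    """R = A1 Phi A1' + A3 A3' with the minor-factor blocks (A2, Gamma, Psi) zero.

    Assumption: A3 is chosen so that R has a unit diagonal.
    """
    A1 = np.asarray(A1, dtype=float)
    common = A1 @ np.asarray(Phi, dtype=float) @ A1.T   # J B J' reduces to A1 Phi A1'
    h2 = np.diag(common)                                # communalities with correlated factors
    if np.any(h2 >= 1.0):
        raise ValueError("every communality must be below 1")
    A3 = np.diag(np.sqrt(1.0 - h2))
    return common + A3 @ A3.T


# Two factors, three variables each loading .7, interfactor correlation .3
A1 = np.kron(np.eye(2), np.full((3, 1), 0.7))
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
R = correlated_population_r(A1, Phi)
```

Variables on different factors now correlate at .7 × .3 × .7 = .147 instead of 0.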

It should be noted that this method of generating population correlation matrices controls the number of measured variables, the number of common factors, and the level of communality, but more than one population correlation matrix can be generated with the desired number of items, factors, and communality. The specifications define a general class of population matrices; the random selection involved in creating the conceptual input loading matrix, the introduction of error, and the random selection of specific communality values all influence the particular population correlation matrix that is obtained.

Monte Carlo Study Design 

The Monte Carlo study included five factors in the design: (a) the number of common factors, k, present in the population (populations were simulated with k = 3, 5, and 7 factors); (b) the number of variables, p, in the correlation matrices (p = 3k, 5k, and 10k); (c) the level of communality (high, wide, and low); (d) the level of interfactor correlation (r_ij = 0, .3, .5, and a mixed condition with interfactor correlations ranging from 0 to .5); and (e) sample size, N (samples of 3p, 5p, 10p, 20p, and 40p). These N:p ratios range from values generally considered insufficient to values considered more than adequate (MacCallum et al., 1999). The design factors were fully crossed, providing a total of 540 conditions (3 × 3 × 3 × 4 × 5) in the Monte Carlo study.
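The count of 540 conditions follows from fully crossing the five design factors. The short enumeration below is an illustrative check; the dictionary keys and the "mixed" label are hypothetical names for the levels listed above.

```python
from itertools import product

k_levels = [3, 5, 7]
p_per_factor = [3, 5, 10]            # p = 3k, 5k, 10k
communality = ["high", "wide", "low"]
phi = [0.0, 0.3, 0.5, "mixed"]       # "mixed": correlations ranging from 0 to .5
n_per_variable = [3, 5, 10, 20, 40]  # N = 3p, 5p, 10p, 20p, 40p

conditions = [
    {"k": k, "p": r * k, "communality": h, "phi": f, "N": m * r * k}
    for k, r, h, f, m in product(k_levels, p_per_factor, communality, phi, n_per_variable)
]
print(len(conditions))  # 540
```

The smallest sample in the design is N = 27 (k = 3, p = 9, N = 3p) and the largest is N = 2,800 (k = 7, p = 70, N = 40p).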

The research was conducted using SAS/IML version 8.2. Conditions for the study 
were run under Windows 98 and Windows 2000. For each condition investigated, 10 
population matrices were generated. Then for each population matrix, 1,000 samples 
were generated. This design made it possible to examine the degree to which the results varied across different population matrices within a condition. The resulting 10,000 samples per condition (10 population matrices × 1,000 samples each) provided adequate precision for estimating the sampling behavior of the factor recovery indices. For example, with 10,000 samples the 95% confidence interval around an observed proportion has a half-width of at most ± .0098 (Robey & Barcikowski, 1992).
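The ± .0098 figure is the normal-approximation half-width for a proportion evaluated at its worst case, p = .5, where the binomial variance p(1 - p) is largest. The one-line check below assumes that approximation.

```python
import math


def max_ci_half_width(n_samples, z=1.96):
    """Largest 95% half-width for an observed proportion (attained at p = .5)."""
    return z * math.sqrt(0.5 * 0.5 / n_samples)


print(round(max_ci_half_width(10_000), 4))  # 0.0098
```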
Mean Number of Factors Retained

The mean number of factors obtained by each method is presented for all combinations of k, p, N, and communality in Tables 2-5, where each table contains a different level of correlation among the factors. A quick perusal of these tables suggests that the mean number of factors retained differs across methods. Moreover, for each method, the closeness of the mean number of factors retained to the true number of factors, k, varies across the conditions studied, with larger sample sizes and more variables tending to lead to better results.

To summarize the performance of the methods, a series of box plots was created. Each box plot shows the distribution of the mean number of factors retained for a particular method. These box plots are presented in Figures 1-3, for k = 3, 5, and 7 factors, respectively. For the three-factor condition (Figure 1), the CNG method has a mean number of factors that is very close to 3, the true value, for all conditions. The other methods show somewhat more variability across conditions. Interestingly, in the five-factor conditions (Figure 2), the CNG still tends to suggest 3 factors, and it continues to suggest three factors even when the true number of factors is seven (Figure 3). Thus its effectiveness drops off considerably as the number of factors increases.

Two of the parallel analysis methods (PAmh and PAcl) appear to do relatively well in the sense that the mean number of factors retained is close to the true number of factors for a relatively large portion of the conditions. Note that when these methods produce means that differ from k, the means tend to be smaller, suggesting that too few factors are being retained. The tendency to retain too few factors is also found, and is somewhat more prevalent, for the PAll, MAP, and KSR methods. The distributions for t, SEscree, and M-KSR show the most variability, and these are the only methods that tend to have an average number of retained factors exceeding k for many conditions.

Because the mean number of factors retained differs across conditions for each method, ω² was used to determine which design factors were associated with the most variability. These values are presented in Table 6. Ideally, the majority of the variation in the mean number of factors retained would be associated with k, the true number of factors. The only method for which over 50% of the variation was attributable to k was the PAcl method (ω² = .61). Figure 4 contains a graph showing the mean number of factors retained for each method as a function of k. As k increases from 3 to 5 to 7, the mean number of factors retained also increases for each method except the CNG, which stays level at 3 factors extracted. Ideally the bars would reach, but not exceed, values of 3, 5, and 7, and the method that comes closest to this is the M-KSR.
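ω² is not defined in this section. As a rough illustration only, the sketch below computes the standard one-way ANOVA estimate, ω² = (SS_effect - df_effect × MS_error) / (SS_total + MS_error), for a single design factor whose levels each hold a set of values (for example, condition means). The exact ANOVA model the authors used to partition variation across the five design factors is not stated here, so the one-way layout and the function name are assumptions.

```python
import numpy as np


def omega_squared(groups):
    """One-way omega squared; `groups` holds one 1-D sample per factor level."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    ss_total = ss_between + ss_within
    df_between = len(groups) - 1
    ms_within = ss_within / (len(all_values) - len(groups))
    return (ss_between - df_between * ms_within) / (ss_total + ms_within)


# Two levels with well-separated means: SS_between = 13.5, SS_within = 4
w = omega_squared([[1, 2, 3], [4, 5, 6]])   # (13.5 - 1) / (17.5 + 1)
```

Unlike η² = SS_effect / SS_total, ω² subtracts the variation expected from error alone, so it can come out slightly negative when a factor has no real effect.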

The ω² analysis also shows that the mean number of factors retained varies with other design factors. Figure 5 shows this variation as a function of sample size per variable (N/p) for each method. This figure displays results for conditions with k = 5, but the pattern was consistent across the values of k investigated. With increasing N/p, the PAmh, PAcl, and M-KSR methods tend to get close to the expected value of 5. With small N/p ratios, M-KSR tends to retain too many factors, while PAmh and PAcl tend to retain too few.

The effect of the number of variables per factor (p/k) is shown in Figure 6. The methods tend to have a larger mean number of retained factors when the number of variables per factor is greater. When the number of variables per factor is relatively small (3), all methods tend to underestimate the number of factors. When the number of variables per factor is large (10), four methods have a mean number of retained factors that is near the true value of 5: the PAmh, PAcl, MAP, and KSR methods.

The effects of communality are depicted in Figure 7. As communality increases, the mean number of factors retained by the methods tends to get closer to the true value of 5. For some methods this implies that the mean number of retained factors decreases with communality, while for others it increases. Phi also appears to have some effect on the mean number of factors retained for most methods (Figure 8). Generally, the mean number of retained factors decreases as the correlations among factors increase.
Proportion of Samples Retaining the Same Number of Factors as the Population

In addition to the mean number of factors obtained, the extent to which the number of factors retained in each sample matrix matched the number of factors present in the population correlation matrix was examined. The proportion of agreement between the sample matrices and the population matrices (i.e., the proportion of samples retaining k factors) is presented for all combinations of k, p, N, and communality in Tables 7-10, with each table presenting one level of inter-factor correlation. A review of these tables reveals that the proportion of samples in agreement with the population differs considerably across the methods and conditions investigated. Generally, larger sample sizes, a greater number of variables per factor, high communality levels, and low inter-factor correlations were associated with better results.

Because of the large variability across the conditions examined for each method, ω² was computed to determine which design factors were associated with the most variability in the proportions. Table 11 displays these results. In contrast to the expectation for the mean number of factors retained, little of the variation in the proportion of agreement was expected to be associated with k, the true number of factors underlying the population correlation matrix. However, for the CNG method the majority of the variation was attributable to k (ω² = .98), and a considerable amount of the variation for the MR procedure was also attributable to k (ω² = .48). As demonstrated in Figure 9, as k increased, the proportion of samples in agreement with the population decreased for all of the methods. Although the CNG method demonstrated excellent results for k = 3 matrices, agreement dropped dramatically as the number of factors increased. Additionally, for the MR procedure, agreement dropped to zero for populations with k = 5 and k = 7.

The ω² analysis also revealed that the proportion of samples in agreement with the population varied with other design factors. Figure 10 depicts the variation in proportions as a function of sample size per variable (N/p) for each method. With increasing N/p, the majority of the methods showed an increase in agreement, with the PAmh method exhibiting superior performance. In addition, the SEscree method showed the greatest improvement in factor retention with increasing N/p.

The influence of the number of variables per factor (p/k) on the proportion of samples in agreement with the population is shown in Figure 11. For the majority of the methods, the proportion of samples in agreement with the population increased as the number of variables per factor increased, with the PAmh and PAcl methods demonstrating exceptional performance. The MAP procedure showed the most dramatic improvement in agreement as the number of variables per factor increased. In contrast, the CNG and t procedures showed the opposite trend, declining in their agreement with the population as the number of variables per factor increased. Interestingly, the SEscree method showed substantial improvement in the level of agreement as the number of variables per factor increased up to a p/k ratio of 5; beyond that point, the proportion of samples in agreement with the population declined.

The effects of communality on the proportion of samples in agreement with the population were also examined. As shown in Figure 12, for the majority of the methods the proportion of samples in agreement with the population increased as communality increased. In contrast, the MR and t methods showed a decrease in performance with an increase in communality level. Phi also appeared to affect the proportion of agreement for almost all methods (Figure 13). With the exception of the CNG and MR methods, agreement decreased as the interfactor correlation increased; negligible differences were observed for the CNG and MR procedures as the correlation among factors increased.

Proportion of Samples Retaining the Same Number of Factors as the Rule of Thumb 
Applied to the Population 

Lastly, we examined the extent to which the number of factors retained by each rule in the samples matched the number of factors that would have been retained had the rule been applied to the population correlation matrix. This agreement is presented for all combinations of k, p, N, and communality in Tables 12-15, with each table displaying a different level of inter-factor correlation. An examination of these tables suggests that the proportion of
samples differs considerably across the methods, and for each method this proportion is 
seen to vary across the conditions studied. In general, conditions with larger sample 
sizes and more variables per factor evidenced better results. 
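The two agreement criteria used in this study (matching the true k versus matching the rule's own verdict on the population matrix) can be contrasted with a small sketch. It uses the Kaiser rule, counting eigenvalues of the unreduced correlation matrix greater than 1, purely as an example rule; it assumes multivariate normal sampling, and the function names are hypothetical. Any of the ten rules could be passed in as `rule`.

```python
import numpy as np


def kaiser_rule(R):
    """Number of eigenvalues of the (unreduced) correlation matrix above 1."""
    return int(np.sum(np.linalg.eigvalsh(R) > 1.0))


def agreement_rates(R_pop, k_true, n, n_samples, rule, rng):
    """Proportion of samples in which `rule` retains k_true factors, and the
    proportion in which it matches what the rule gives for R_pop itself."""
    p = R_pop.shape[0]
    rule_pop = rule(R_pop)                 # the rule applied to the population
    hits_k = hits_pop = 0
    for _ in range(n_samples):
        X = rng.multivariate_normal(np.zeros(p), R_pop, size=n)
        m = rule(np.corrcoef(X, rowvar=False))
        hits_k += m == k_true
        hits_pop += m == rule_pop
    return hits_k / n_samples, hits_pop / n_samples, rule_pop


R_pop = np.full((6, 6), 0.64)              # one factor, all loadings .8
np.fill_diagonal(R_pop, 1.0)
prop_k, prop_pop, rule_pop = agreement_rates(
    R_pop, 1, 100, 50, kaiser_rule, np.random.default_rng(2))
```

The two proportions coincide whenever the rule recovers the true k in the population, as it does here; they diverge only when the rule itself misjudges the population matrix, which is why the second criterion isolates sampling error from the rule's built-in bias.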

Because the proportions varied across the conditions examined for each method, we computed a series of ω² values to investigate which design factors were associated with the most variability in these proportions. These values are presented in Table 16. Consistent with the results presented above regarding agreement between the sample matrices and the population matrices, we would not expect a sizeable portion of the variation in agreement to be associated with k, the true number of factors underlying the population correlation matrix. The only method for which a considerable amount of variation was attributable to k was the MR method (ω² = .46). Figure 14 illustrates the proportion of samples agreeing with the rule applied to the population R-matrix as a function of k. As k increased, the proportion of samples in agreement with the population decreased for most of the methods, with the exception of MR, which decreased at k = 5 but increased substantially at k = 7. Additionally, agreement for the CNG remained at zero regardless of the number of factors. Finally, for the t procedure, agreement dropped quite dramatically as the number of factors increased.

The ω² analysis revealed that the proportion of samples in agreement with the population also varied with other design factors. Figure 15 shows the variation in proportions as a function of sample size per variable (N/p) for each method. With increasing N/p, the majority of the methods showed an increase in agreement, with the non-scree-based methods exhibiting superior performance. However, the SEscree method showed the biggest improvement in factor retention with increasing N/p.

The influence of the number of variables per factor (p/k) is shown in Figure 16. We expected a considerable influence to be associated with p/k. Three methods were observed for which a notable amount of variation was attributable to p/k: t, PAmh, and PAcl (ω² = .28, .28, and .24, respectively). The non-scree-based methods tended to fare slightly better or remain constant as the number of variables per factor increased. The opposite trend was evident for the scree-based methods, all of which decreased in agreement as the number of variables per factor increased.

Finally, the effects of communality are depicted in Figure 17. As communality increased, the proportion of samples agreeing with the population tended to increase for the majority of methods examined, with the exception of the PAll method, which showed a decrease in performance with increased communality. Phi also appeared to have some effect on the proportion of agreement for most methods (Figure 18). Generally, agreement decreased as the interfactor correlation increased for the parallel analysis approaches (PAmh, PAcl, and PAll) and the methods based on the Kaiser criterion (KSR and M-KSR). Negligible differences were observed for the MAP and t procedures as the correlation among factors increased; however, the MAP procedure outperformed all other methods for the conditions examined.

Discussion 

This empirical investigation of methods to determine the number of factors to retain in common factor analysis clearly suggests that both the choice of method and the design of the factor analytic study play crucial roles in retaining the correct number of common factors. For example, with phenomena characterized by low communality of variables, a study designed with few variables per factor and a small sample size is unlikely to lead to the correct number of factors, regardless of the method selected (except for the trivial case of the success of the CNG method when the true number of factors is 3). Investigations in such areas should include larger numbers of variables (i.e., p = 10k) and a very large number of observations (N = 40p). With such a design, the SEscree, parallel analysis (both PAmh and PAcl), and M-KSR methods provided nearly 100% accuracy across levels of k and inter-factor correlation. Phenomena characterized by higher levels of communality appear to have less stringent data requirements, especially if the correlation between factors is low.
The most prevalent type of error made in the number-of-factors problem appears to be underfactoring when the parallel analysis methods (PAmh, PAcl, PAll), the MAP, or the Kaiser methods (KSR, M-KSR) are used. Turner (1998) also found that under some circumstances, parallel analysis may underestimate the number of factors in the data. Among the methods based on fitting regression lines to the scree plot, underfactoring was also characteristic of CNG and MR, while the t procedure showed a tendency toward overfactoring (especially as the true number of factors increased). The SEscree method appeared to offer a balance between overfactoring and underfactoring, although it showed a tendency to estimate an exceptionally large number of factors in conditions with small sample sizes.

In terms of overall accuracy across the conditions examined in this study, the PAmh approach to parallel analysis provided the largest proportion of samples retaining the correct value of k, an advantage seen across levels of communality, inter-factor correlation, and values of k (except for k = 3, in which the CNG approach was the most accurate). Across sample sizes, the PAmh method was the most accurate except for the smallest samples examined (N = 3p), in which the MAP showed a slight advantage. Similarly, across values of p/k, the PAmh was the most accurate except for the smallest value (p/k = 3), in which the CNG provided the highest average accuracy.
Interestingly, the more recent, modified prediction equations provided by 
Lautenschlager, Lance and Flaherty (1989) substantially reduced the accuracy in 
number-of-factors determination. These results, of course, must be considered in the 
light of the limitations of this research. Although the simulation approach we have 
followed has examined a range of values for p, k and N, and a variety of communality 
levels and levels of inter-factor correlation, further research is recommended to extend 
these findings. Additional work is also needed in the development of methods for accurately determining the number of factors in the more challenging conditions revealed here (e.g., low communality, few variables per factor, and small sample sizes), conditions in which none of these rules of thumb were successful.
Such additional work is important because exploratory factor analysis is a frequently used multivariate technique in educational research. In the conduct of actual exploratory factor analyses, researchers face critical decisions regarding the number of factors to extract and retain. The interpretations gleaned from factor analytic solutions depend in large part upon the appropriate use of factor retention strategies. As exploratory factor analysis continues to enjoy a prominent position among the currently available multivariate methods, researchers must remain mindful of the limitations of certain procedures and methods. The results of this study underscore the need to exercise caution in factor retention decisions and highlight the need to consider, in the planning stages, those aspects of the factor solution that are important in the research application. This research furnishes valuable information about the sensitivity of commonly employed methods and provides guidance regarding the choice of alternative strategies.
Table 1

Example Population Pattern Matrices as a Function of Communality when k = 3 and p = 15

      High Communality       Wide Communality       Low Communality
Var.    F1     F2     F3       F1     F2     F3       F1     F2     F3
 1     .87*  -.01    .21      .89*  -.03   -.03      .63*  -.01    .07
 2     .87*  -.05    .19      .83*   .04   -.04      .55*  -.02   -.03
 3     .86*   .18    .18      .74*   .49    .13      .45*  -.02   -.02
 4     .76*   .47    .06      .69*   .57   -.05      .44*  -.01    .09
 5     .77*  -.03   -.03      .45*  -.01    .04      .38*   .23    .03
 6     .61*  -.02    .58      .18    .87*   .08      .36*   .24    .34
 7     .58*   .51   -.06      .16    .69*  -.01      .11    .62*  -.03
 8    -.01    .89*   .08      .15    .69*  -.02     -.02    .55*  -.02
 9    -.04    .84*  -.03     -.01    .56*   .30     -.02    .45*  -.02
10     .16    .82*  -.04     -.03   -.04    .84*     .24    .38*  -.03
11    -.04    .78*   .30     -.03    .11    .70*     .21   -.03    .60*
12     .40    .64*   .36      .48   -.06    .68*     .04    .00    .55*
13    -.06    .10    .89*    -.04    .59    .67*     .10   -.01    .54*
14    -.03    .03    .77*     .18   -.03    .52*     .15    .03    .53*
15     .48   -.01    .69*     .16   -.01    .42*     .23    .14    .48*

Note. Largest loading for each variable is marked with an asterisk.
Table 7

Proportion of Samples Retaining K Factors, Phi = 0.00

Low Communality Wide Communality High Communality
Table 10 

Proportion of Samples Retaining K Factors, Phi = Mixed 
Table 16

Omega Squared for Proportion of Samples Agreeing with Rule Applied to Population R-Matrix

Figure 1. Distributions of Mean Number of Factors Retained, k =
Figure 2. Distributions of Mean Number of Factors Retained, k =
Figure 3. Distributions of Mean Number of Factors Retained, k =
Figure 4. Mean Number of Factors Retained by True Number of Factors
Figure 5. Mean Number of Factors Retained by N per Variable (N:p) for k =
Figure 6. Mean Number of Factors Retained by Number of Variables Per Factor (p:k) for k =
Figure 7. Mean Number of Factors Retained by Communality Type for k =
Figure 8. Mean Number of Factors Retained by Phi for k =
Figure 9. Proportion of Samples Retaining K Factors by True Number of Factors
Figure 10. Proportion of Samples Retaining K Factors by N per Variable (N:p)
Figure 11. Proportion of Samples Retaining K Factors by Number of Variables Per Factor (p:k)
Figure 12. Proportion of Samples Retaining K Factors by Communality Type
Figure 13. Proportion of Samples Retaining K Factors by Phi
Figure 14. Proportion of Samples Agreeing with Rule Applied to Population R-Matrix by True Number of Factors
Figure 15. Proportion of Samples Agreeing with Rule Applied to Population R-Matrix by N per Variable (N:p)
Figure 16. Proportion of Samples Agreeing with Rule Applied to Population R-Matrix by Number of Variables Per Factor (p:k)
Figure 17. Proportion of Samples Agreeing with Rule Applied to Population R-Matrix by Communality Type
Figure 18. Proportion of Samples Agreeing with Rule Applied to Population R-Matrix by Phi